Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping

Abstract

In the EMIME project, we developed a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrated two techniques into a single architecture: unsupervised adaptation for HMM-based TTS using word-based large-vocabulary continuous speech recognition, and cross-lingual speaker adaptation (CLSA) for HMM-based TTS. The CLSA is based on a state-level transform mapping learned using minimum Kullback–Leibler divergence between pairs of HMM states in the input and output languages. Thus, an unsupervised cross-lingual speaker adaptation system was developed. End-to-end speech-to-speech translation systems for four languages (English, Finnish, Mandarin, and Japanese) were constructed within this framework. In this paper, the English-to-Japanese adaptation is evaluated. Listening tests demonstrate that adapted voices sound more similar to a target speaker than average voices and that differences between supervised and unsupervised cross-lingual speaker adaptation are small. Calculating the KLD state-mapping on only the first 10 mel-cepstral coefficients leads to huge savings in computational costs, without any detrimental effect on the quality of the synthetic speech.
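The state-level mapping described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each HMM state is modelled by a single diagonal-covariance Gaussian, the divergence is symmetrised, and every output-language state is mapped to its nearest input-language state. The names `kld_diag`, `symmetric_kld`, and `map_states` are hypothetical and do not come from the EMIME software. The `dims` parameter mirrors the idea of computing the divergence on only the first 10 mel-cepstral coefficients.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: one diagonal-covariance Gaussian per HMM state,
# stored as (means, variances).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kld_diag(p: Gaussian, q: Gaussian, dims: int) -> float:
    """KL(p || q) for diagonal Gaussians, using only the first `dims` dimensions."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    total = 0.0
    for i in range(dims):
        diff = mean_p[i] - mean_q[i]
        total += math.log(var_q[i] / var_p[i]) + (var_p[i] + diff * diff) / var_q[i] - 1.0
    return 0.5 * total


def symmetric_kld(p: Gaussian, q: Gaussian, dims: int) -> float:
    """Symmetrised divergence: KL(p || q) + KL(q || p)."""
    return kld_diag(p, q, dims) + kld_diag(q, p, dims)


def map_states(output_states: Sequence[Gaussian],
               input_states: Sequence[Gaussian],
               dims: int = 10) -> List[int]:
    """For each output-language state, return the index of the
    input-language state with minimum symmetric KLD (hypothetical direction)."""
    mapping = []
    for state in output_states:
        best = min(range(len(input_states)),
                   key=lambda j: symmetric_kld(state, input_states[j], dims))
        mapping.append(best)
    return mapping


if __name__ == "__main__":
    # Toy 3-dimensional states; real systems would use mel-cepstral statistics.
    input_states = [([0.0, 0.0, 9.0], [1.0, 1.0, 1.0]),
                    ([5.0, 5.0, 0.0], [1.0, 1.0, 1.0])]
    output_states = [([4.5, 5.2, 9.0], [1.0, 1.0, 1.0]),
                     ([0.1, -0.2, 0.0], [1.0, 1.0, 1.0])]
    # Restricting to the first two dimensions ignores the third coefficient.
    print(map_states(output_states, input_states, dims=2))
```

The cost of building the full mapping grows with the number of state pairs times `dims`, which is why truncating the feature vector reduces computation in this sketch. The savings reported in the abstract refer to the authors' own system, not to this code.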